Introduction

Cohort studies provide a unique opportunity to test hypotheses that cannot be evaluated through randomized controlled trials. To be effective, they require a substantial and carefully chosen cohort, which typically necessitates time-consuming chart reviews by physicians. Several automation strategies compatible with electronic health records (EHRs) have been designed to speed up this process. However, these are often ineffective for diseases diagnosed through pathology, such as myelodysplastic syndrome (MDS), because clinical pathology data are stored in free-text reports. In this study, we used a combination of natural-language processing (NLP), International Classification of Diseases (ICD) coding, and Generative Pre-trained Transformer 4 (GPT-4) free-text analysis to sift through a large clinical database and identify patients with pathology-confirmed MDS who met several additional criteria.

Methods

We used the Research Derivative (RD), a clinically annotated database at Vanderbilt University Medical Center (VUMC), to screen for patients with MDS-related ICD-10 codes, available sequencing data, and hematopathology keywords identified through NLP. Out of 5 million patients, 724 met these criteria. We then used GPT-4 to analyze the pathology reports from their diagnostic bone marrow biopsies. For each report, the model provided a usability score between 0 and 1, with 1 representing 100% certainty that the report was usable. A report was defined as usable if it met three criteria: it was a bone marrow biopsy report diagnostic of MDS; it showed no evidence of any other hematologic disease; and it did not mention any prior MDS treatment. We used these scores to generate a receiver operating characteristic (ROC) curve, from which we determined the best threshold for converting usability scores into a binary ‘yes’ or ‘no’ classification. We then compared these results with manual chart reviews for verification.
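The thresholding step can be illustrated with a minimal sketch. The abstract does not state how the best threshold was selected, so the use of Youden's J statistic (true-positive rate minus false-positive rate) here is an assumption, and the scores and labels are synthetic toy values, not study data. The function names are hypothetical.

```python
from typing import List, Tuple

def roc_points(scores: List[float], labels: List[int]) -> List[Tuple[float, float, float]]:
    """Return (threshold, TPR, FPR) for every distinct score used as a cut-off.

    A report is classified 'yes' (usable) when its score >= threshold.
    labels: 1 = usable according to manual chart review, 0 = not usable.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    points = []
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((t, tp / pos, fp / neg))
    return points

def auc(points: List[Tuple[float, float, float]]) -> float:
    """Trapezoidal area under the ROC curve, starting from the point (0, 0)."""
    curve = [(0.0, 0.0)] + [(fpr, tpr) for _, tpr, fpr in points]
    area = 0.0
    for (x0, y0), (x1, y1) in zip(curve, curve[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

def best_threshold(points: List[Tuple[float, float, float]]) -> float:
    """Pick the threshold maximizing Youden's J = TPR - FPR (assumed criterion)."""
    return max(points, key=lambda p: p[1] - p[2])[0]

# Synthetic example only: eight reports with model scores and review labels.
toy_scores = [0.95, 0.90, 0.80, 0.60, 0.55, 0.40, 0.30, 0.10]
toy_labels = [1, 1, 0, 1, 1, 0, 0, 0]

toy_points = roc_points(toy_scores, toy_labels)
toy_auc = auc(toy_points)
toy_cutoff = best_threshold(toy_points)
```

Other selection rules, such as fixing a minimum recall and then maximizing specificity, would fit a screening setting equally well; Youden's J is used here only because it is simple to state.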

Results

The area under the ROC curve was 0.95, and the optimal classification threshold was a usability score of 0.5. At this threshold, the GPT-4 model achieved a recall of 0.981, incorrectly excluding only one case among the 724 patients who met the NLP and ICD-10 criteria. The accuracy was 0.756; the most common reason for misclassification was the absence of prior-treatment information in the hematopathology report, as confirmed through manual chart reviews.
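As a reminder of how these two metrics relate, the sketch below computes recall and accuracy from confusion-matrix counts. The counts are illustrative placeholders, not the study's data, and the function name is hypothetical. It shows how a classifier can have high recall (few usable reports missed) while its accuracy is pulled down by false positives, such as reports that look usable but belong to previously treated patients.

```python
def recall_and_accuracy(tp: int, fn: int, fp: int, tn: int) -> tuple:
    """Compute recall and accuracy from confusion-matrix counts.

    recall   = TP / (TP + FN): share of truly usable reports kept
    accuracy = (TP + TN) / all reports: share of all reports classified correctly
    """
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    return recall, accuracy

# Illustrative counts only (not from the study): few false negatives,
# many false positives.
example_recall, example_accuracy = recall_and_accuracy(tp=45, fn=5, fp=20, tn=30)
```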

Conclusions

The combination of NLP, ICD coding, and GPT-4 analysis proved highly effective in screening a large clinical database for patients with pathology-confirmed MDS, reducing potential study candidates from five million to several hundred. Manual reviews of separate clinical charts revealed treatment status information not listed in the pathology reports. Future automated phenotyping efforts could benefit from GPT-4 screening of contemporaneous clinic notes to enhance precision.

Disclosures

Bick: TenSixteen Bio: Consultancy. Ferrell: Novartis: Research Funding. Kishtagari: Sevier Pharmaceuticals: Consultancy, Membership on an entity's Board of Directors or advisory committees; Syndex: Current equity holder in publicly-traded company; Morphosys: Membership on an entity's Board of Directors or advisory committees; Sobi: Membership on an entity's Board of Directors or advisory committees, Speakers Bureau; Geron Corporation: Current equity holder in publicly-traded company, Membership on an entity's Board of Directors or advisory committees; Rigel: Membership on an entity's Board of Directors or advisory committees.
